 static scene


REACT3D: Recovering Articulations for Interactive Physical 3D Scenes

arXiv.org Artificial Intelligence

Interactive 3D scenes are increasingly vital for embodied intelligence, yet existing datasets remain limited due to the labor-intensive process of annotating part segmentation, kinematic types, and motion trajectories. We present REACT3D, a scalable zero-shot framework that converts static 3D scenes into simulation-ready interactive replicas with consistent geometry, enabling direct use in diverse downstream tasks. Our contributions include: (i) openable-object detection and segmentation to extract candidate movable parts from static scenes, (ii) articulation estimation that infers joint types and motion parameters, (iii) hidden-geometry completion followed by interactive object assembly, and (iv) interactive scene integration in widely supported formats to ensure compatibility with standard simulation platforms. We achieve state-of-the-art performance on detection/segmentation and articulation metrics across diverse indoor scenes, demonstrating the effectiveness of our framework and providing a practical foundation for scalable interactive scene generation, thereby lowering the barrier to large-scale research on articulated scene understanding. Our project page is https://react3d.github.io/


Event-guided 3D Gaussian Splatting for Dynamic Human and Scene Reconstruction

arXiv.org Artificial Intelligence

Reconstructing dynamic humans together with static scenes from monocular videos remains difficult, especially under fast motion, where RGB frames suffer from motion blur. Event cameras exhibit distinct advantages, e.g., microsecond temporal resolution, making them a superior sensing choice for dynamic human reconstruction. Accordingly, we present a novel event-guided human-scene reconstruction framework that jointly models human and scene from a single monocular event camera via 3D Gaussian Splatting. Specifically, a unified set of 3D Gaussians carries a learnable semantic attribute; only Gaussians classified as human undergo deformation for animation, while scene Gaussians stay static. To combat blur, we propose an event-guided loss that matches simulated brightness changes between consecutive renderings with the event stream, improving local fidelity in fast-moving regions. Our approach removes the need for external human masks and simplifies managing separate Gaussian sets. On two benchmark datasets, ZJU-MoCap-Blur and MMHPSD-Blur, it delivers state-of-the-art human-scene reconstruction, with notable gains over strong baselines in PSNR/SSIM and reduced LPIPS, especially for high-speed subjects.
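The event-guided loss described above can be illustrated with a minimal sketch. It assumes the standard event-camera model, in which each event signals a log-brightness change of one contrast threshold; the function name `event_guided_loss`, the `threshold` value, and the per-pixel mean-squared form are illustrative assumptions, not the paper's actual formulation.

```python
import math


def event_guided_loss(render_prev, render_curr, event_sum, threshold=0.2, eps=1e-6):
    """Mean squared mismatch between rendered and event-measured brightness change.

    render_prev, render_curr: 2D lists of positive intensities from two
        consecutive renderings.
    event_sum: 2D list of signed event counts accumulated per pixel between
        the two renderings.
    threshold: assumed contrast threshold of the event camera (hypothetical value).
    """
    total = 0.0
    count = 0
    for row_prev, row_curr, row_ev in zip(render_prev, render_curr, event_sum):
        for prev, curr, events in zip(row_prev, row_curr, row_ev):
            # Brightness change simulated from the two renderings (log domain).
            simulated = math.log(curr + eps) - math.log(prev + eps)
            # Brightness change implied by the event stream.
            measured = threshold * events
            total += (simulated - measured) ** 2
            count += 1
    return total / count


if __name__ == "__main__":
    prev = [[1.0, 1.0]]
    curr = [[math.exp(0.2), 1.0]]   # first pixel brightens by one threshold step
    events = [[1, 0]]               # one positive event at the first pixel
    print(event_guided_loss(prev, curr, events))  # close to zero
```

In a real pipeline the renderings would come from the differentiable Gaussian rasterizer so that the loss can be backpropagated; plain lists are used here only to keep the sketch self-contained.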


HybridGS: Decoupling Transients and Statics with 2D and 3D Gaussian Splatting

arXiv.org Artificial Intelligence

Generating high-quality novel view renderings of 3D Gaussian Splatting (3DGS) in scenes featuring transient objects is challenging. We propose a novel hybrid representation, termed HybridGS, that uses 2D Gaussians for the transient objects in each image and maintains traditional 3D Gaussians for the whole static scene. 3DGS itself is better suited to modeling static scenes that satisfy multi-view consistency, but transient objects appear only occasionally and do not adhere to this assumption, so we model them as planar objects from a single view, represented with 2D Gaussians. Our representation decomposes the scene according to fundamental viewpoint consistency, making it more reasonable. Additionally, we present a novel multi-view regulated supervision method for 3DGS that leverages information from co-visible regions, further enhancing the distinction between transients and statics. We then propose a straightforward yet effective multi-stage training strategy to ensure robust training and high-quality view synthesis across various settings. Experiments on benchmark datasets show state-of-the-art novel view synthesis performance in both indoor and outdoor scenes, even in the presence of distracting elements.


GraspSplats: Efficient Manipulation with 3D Feature Splatting

arXiv.org Artificial Intelligence

The ability for robots to perform efficient and zero-shot grasping of object parts is crucial for practical applications and is becoming prevalent with recent advances in Vision-Language Models (VLMs). To bridge the 2D-to-3D gap for representations to support such a capability, existing methods rely on neural fields (NeRFs) via differentiable rendering or point-based projection methods. However, we demonstrate that NeRFs are inappropriate for scene changes due to their implicitness and point-based methods are inaccurate for part localization without rendering-based optimization. To amend these issues, we propose GraspSplats. Using depth supervision and a novel reference feature computation method, GraspSplats generates high-quality scene representations in under 60 seconds. We further validate the advantages of Gaussian-based representation by showing that the explicit and optimized geometry in GraspSplats is sufficient to natively support (1) real-time grasp sampling and (2) dynamic and articulated object manipulation with point trackers. With extensive experiments on a Franka robot, we demonstrate that GraspSplats significantly outperforms existing methods under diverse task settings. In particular, GraspSplats outperforms NeRF-based methods like F3RM and LERF-TOGO, and 2D detection methods.


Real-time SLAM Pipeline in Dynamics Environment

arXiv.org Artificial Intelligence

Inspired by the recent success of dense-data approaches such as ORB-SLAM and RGB-D SLAM, we propose a better pipeline for real-time SLAM in dynamic environments. Unlike previous SLAM systems, which can only handle static scenes, we present a solution that uses RGB-D SLAM together with YOLO real-time object detection to segment and remove dynamic scene content and then reconstruct the static scene in 3D. We gathered a dataset that allows us to jointly consider semantics, geometry, and physics, and thus enables us to reconstruct the static scene while filtering out all dynamic objects.
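The idea of removing detected dynamic objects before reconstruction can be sketched as follows. This is a simplified illustration, not the authors' pipeline: the detections are assumed to be already available as `(label, (x0, y0, x1, y1))` tuples from some object detector, and the `DYNAMIC_CLASSES` set, the function name `mask_dynamic_depth`, and the use of `0.0` as the invalid-depth marker are all hypothetical choices.

```python
# Hypothetical set of labels treated as dynamic; the abstract does not list them.
DYNAMIC_CLASSES = {"person", "car"}


def mask_dynamic_depth(depth, detections, invalid=0.0):
    """Return a copy of the depth map with dynamic-object boxes invalidated.

    depth: 2D list of depth values (rows of floats).
    detections: iterable of (label, (x0, y0, x1, y1)) with pixel boxes,
        where x1 and y1 are exclusive.
    invalid: value marking pixels that a SLAM back end should ignore.
    """
    masked = [row[:] for row in depth]
    height = len(masked)
    width = len(masked[0]) if height else 0
    for label, (x0, y0, x1, y1) in detections:
        if label not in DYNAMIC_CLASSES:
            continue  # static objects stay in the map
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                masked[y][x] = invalid
    return masked


if __name__ == "__main__":
    depth = [[1.0] * 3 for _ in range(3)]
    detections = [("person", (0, 0, 2, 2)), ("chair", (2, 2, 3, 3))]
    for row in mask_dynamic_depth(depth, detections):
        print(row)
```

Masking whole bounding boxes also discards some static background around each object; a segmentation mask would be tighter, at extra cost.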


An Exploration of Neural Radiance Field Scene Reconstruction: Synthetic, Real-world and Dynamic Scenes

arXiv.org Artificial Intelligence

This project presents an exploration into 3D scene reconstruction of synthetic and real-world scenes using Neural Radiance Field (NeRF) approaches. We primarily take advantage of the reduction in training and rendering time of neural graphic primitives multi-resolution hash encoding, to reconstruct static video game scenes and real-world scenes, comparing and observing reconstruction detail and limitations. Additionally, we explore dynamic scene reconstruction using Neural Radiance Fields for Dynamic Scenes (D-NeRF). Finally, we extend the implementation of D-NeRF, originally constrained to handle synthetic scenes, to also handle real-world dynamic scenes.

Traditional NeRF approaches can reconstruct both synthetic and real-world scenes, and new methods like Instant Neural Graphics Primitives [5] significantly speed up the training process; however, these methods are limited to scenes with static objects. D-NeRF (Dynamic NeRF [7]) extends traditional NeRF with time conditioning, making it possible to reconstruct scenes with dynamic objects; however, the implementation of D-NeRF was limited to synthetic scenes where ground truth camera parameters exist. Our goal is to extend the implementation of D-NeRF to reconstruct real-world scenes with dynamic objects like dancing people.


Unveiling Unexpected Training Data in Internet Video

Communications of the ACM

During training, the squared L2 error between the clean spectrogram and the predicted spectrogram is used as a loss function to train the network. At inference time, our separation model can be applied to arbitrarily long segments of video and varying numbers of speakers. The latter is achieved by either directly training the model with multiple input visual streams (one per speaker), or simply by feeding the visual features of the desired speaker to the visual stream. For full details about the architecture and training process, see our full paper [15].
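As a small illustration of the training loss mentioned above, the squared L2 error between two spectrograms is the sum of squared element-wise differences. The sketch below assumes real-valued spectrograms stored as 2D lists and uses an unnormalized sum; the function name `squared_l2_loss` and the lack of any normalization are assumptions, since the passage does not specify them.

```python
def squared_l2_loss(clean, predicted):
    """Sum of squared differences between two equally shaped spectrograms.

    clean, predicted: 2D lists (frequency bins x time frames) of floats.
    """
    if len(clean) != len(predicted):
        raise ValueError("spectrograms must have the same number of rows")
    total = 0.0
    for row_clean, row_pred in zip(clean, predicted):
        if len(row_clean) != len(row_pred):
            raise ValueError("spectrogram rows must have the same length")
        for c, p in zip(row_clean, row_pred):
            total += (c - p) ** 2
    return total


if __name__ == "__main__":
    clean = [[1.0, 2.0], [3.0, 4.0]]
    predicted = [[1.0, 2.5], [2.0, 4.0]]
    print(squared_l2_loss(clean, predicted))  # 0.5**2 + 1.0**2 = 1.25
```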


Segmentation of static scenes

Classics

A wide range of segmentation techniques continues to evolve in the literature on scene analysis. Many of these approaches have been constrained to limited applications or goals. This survey analyzes the complexities encountered in applying these techniques to color images of natural scenes involving complex textured objects. It also explores new ways of using the techniques to overcome some of the problems which are described. An outline of considerations in the development of a general image segmentation system which can provide input to a semantic interpretation process is distributed throughout the paper.